Abstract. We use ESX, a product of Information Acumen Corporation, to perform
unsupervised learning on a data set containing 797 gamma-ray bursts taken from
the BATSE 3B catalog [5]. Assuming all attributes to be lognormally distributed,
Mukherjee et al. [6] analyzed these same data using a statistical cluster analysis.
Using the logarithmic values of T90 duration, total fluence, and hardness ratio
HR321, they found that the instances formed three classes. Class I contained
long/bright/intermediate bursts, class II consisted of short/faint/hard bursts, and
class III was represented by intermediate/intermediate/soft bursts.

When ESX was presented with these data and restricted to forming a small number 
of classes, the two classes found by previous standard techniques [1] were determined. 
However, when ESX was allowed to form more than two classes, four classes were 
created. One of the four classes contained a majority of short bursts, a second class 
consisted of mostly intermediate bursts, and the final two classes were subsets of the 
Class I (long) bursts determined by Mukherjee et al. We hypothesize that systematic 
biases may be responsible for this variation. 



INTRODUCTION 

Induction-based learning [4] attempts to extract interesting patterns from data. 
These patterns form concept classes with each class containing data instances. 
When the induction is unsupervised, the learning model has no a priori class knowl- 
edge. Rather, the learning algorithm uses one or more statistical or symbolic (ma- 
chine learning) evaluation functions to cluster instances into concept classes. 

Mukherjee et al. [6] performed a statistical cluster analysis on a data set
containing 797 gamma-ray bursts taken from the BATSE 3B catalog [5]. Assuming
all attributes to be lognormally distributed, and using the logarithmic values
of T90 duration, total fluence, and hardness ratio HR321, they found that the
instances formed three classes. Class I contained long/bright/intermediate bursts,
class II consisted of short/faint/hard bursts, and class III was represented by
intermediate/intermediate/soft bursts. Table 1 shows the mean and standard deviation
values for the three classes. Table 2 offers a best defining rule for each class, as
determined by ESX [7]. The rule for class I bursts indicates that 82.72% of the
bursts in this class have a log T90 value between 0.70 and 2.66 and a log fluence
between -5.77 and -3.11. The rule also shows that we can be at least 97% confident
that a burst with these characteristics is a class I burst. Table 2 shows that the
class III rule does not cover its instances as well as the rules for classes I and II.

TABLE 1. Mean and Standard Deviations for the Classes found by Mukherjee et al. (1998)

                      Class I     Class II    Class III
                      Long        Short       Intermediate    Domain

Number of Bursts      486         203         107             796

Log T50 (mean)        1.13        -0.80       0.33            0.53
        (sd)          0.45        0.41        0.26            0.93

Log T90 (mean)        1.55        -0.42       0.71            0.93
        (sd)          0.40        0.44        0.32            0.94

Log Fluence (mean)    -5.21       -6.37       -6.11           -5.63
            (sd)      0.59        0.57        0.37            0.77

Log P256 (mean)       0.21        0.14        -0.08           0.15
         (sd)         0.48        0.38        0.33            0.45

Log HR32 (mean)       0.20        0.51        0.09            0.26
         (sd)         0.27        0.27        0.40            0.33

Log HR321 (mean)      0.43        0.70        0.35            0.49
          (sd)        0.23        0.26        0.39            0.30

Attributes log T50, log P256, and log HR32 were not used in the final analysis since
each had a high correlation with its respective counterpart (log T90, log fluence, and
log HR321).
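
The redundancy check described in this note can be illustrated with a short sketch.
The code computes the Pearson correlation for chosen attribute pairs and reports
the pairs whose absolute correlation reaches a cutoff. The cutoff of 0.9, the
function names, and the four-burst sample are assumptions made for illustration
only; the source does not state the threshold that was applied, and the sample
values are invented rather than taken from the BATSE catalog.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(columns, pairs, threshold=0.9):
    """Return the attribute pairs whose |r| meets the (hypothetical) threshold."""
    return [
        (a, b)
        for a, b in pairs
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Invented illustrative values, NOT catalog data.
sample = {
    "log_t50": [-0.8, 0.3, 1.1, 1.3],
    "log_t90": [-0.4, 0.7, 1.5, 1.7],
    "log_hr321": [0.7, 0.2, 0.5, 0.3],
}
flagged = redundant_pairs(
    sample, [("log_t50", "log_t90"), ("log_t90", "log_hr321")]
)
print(flagged)
```

In a workflow of this kind, one attribute from each flagged pair would be dropped
before clustering so that a single physical property is not counted twice.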



TABLE 2. Representative ESX Rules for the Three
Classes found by Mukherjee et al. (1998)

Class I
(Long Bursts)

    0.70 <= logT90 <= 2.66
    and -5.77 <= logFluence <= -3.11
    :rule accuracy 97.34%
    :rule coverage 82.72%

Class II
(Short Bursts)

    -1.55 <= logT90 <= 0.41
    :rule accuracy 90.87%
    :rule coverage 98.03%

Class III
(Intermediate Bursts)

    0.46 <= logT90 <= 0.96
    and 0.17 <= logT50 <= 0.55
    :rule accuracy 79.17%
    :rule coverage 53.27%
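
The accuracy and coverage figures attached to each rule can be read as two ratios,
following the interpretation given in the Introduction: coverage is the fraction of
a class's bursts that satisfy the rule, and accuracy is the fraction of bursts
satisfying the rule that belong to the class. The sketch below computes both for
the class I rule. The `Burst` type, the function names, and the six sample bursts
are hypothetical and chosen only to make the arithmetic visible; this is not the
ESX implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Burst:
    log_t90: float
    log_fluence: float
    label: str  # class assigned by the clustering: "I", "II", or "III"


def class_one_rule(b: Burst) -> bool:
    """Class I rule from Table 2: both range conditions must hold."""
    return 0.70 <= b.log_t90 <= 2.66 and -5.77 <= b.log_fluence <= -3.11


def rule_coverage(bursts: Sequence[Burst],
                  rule: Callable[[Burst], bool], target: str) -> float:
    """Fraction of the target class's bursts that satisfy the rule."""
    in_class = [b for b in bursts if b.label == target]
    if not in_class:
        return 0.0
    return sum(rule(b) for b in in_class) / len(in_class)


def rule_accuracy(bursts: Sequence[Burst],
                  rule: Callable[[Burst], bool], target: str) -> float:
    """Fraction of the bursts satisfying the rule that are in the target class."""
    matched = [b for b in bursts if rule(b)]
    if not matched:
        return 0.0
    return sum(b.label == target for b in matched) / len(matched)


# Invented illustrative bursts, NOT catalog data.
toy = [
    Burst(1.60, -5.10, "I"),
    Burst(1.20, -4.80, "I"),
    Burst(0.50, -5.90, "I"),    # class I burst that the rule misses
    Burst(0.90, -5.50, "III"),  # class III burst that satisfies the rule
    Burst(0.80, -5.60, "III"),  # class III burst that satisfies the rule
    Burst(-0.40, -6.40, "II"),
]
coverage = rule_coverage(toy, class_one_rule, "I")
accuracy = rule_accuracy(toy, class_one_rule, "I")
print(f"coverage={coverage:.2%} accuracy={accuracy:.2%}")
```

A rule can therefore be accurate without covering much of its class, which is the
situation reported for the class III rule.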



In this paper we use ESX [7], a machine learning model and product of Information
Acumen Corporation, to perform unsupervised learning on these same data
for the purpose of comparative analysis. We chose ESX for this research because
it explains its behavior and has been shown to perform well in several real-world
environments [7].



METHOD 



The machine learning component of ESX is an induction-based sequential learn- 
ing model that creates a concept hierarchy [2] from a set of input instances. ESX 
uses knowledge contained in its concept hierarchy to generate a set of production 
rules to help define and explain what has been discovered. Supervised as well as 
unsupervised learning is supported. 

ESX accepts data in the form of instances represented in attribute-value format.
When learning is unsupervised, ESX takes one of two possible actions for each 
newly presented instance: (1) Place the new instance into an existing cluster, or 
(2) create a new conceptual cluster containing the instance as its only member. 

In addition, ESX allows the user to set a learning parameter so as to encourage 
or discourage the creation of new clusters. For a given domain, a best value for 
this parameter can be determined experimentally. 
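
The sequential behavior described above can be sketched as follows. Each instance
is processed in order and is either added to the closest existing cluster or used
to start a new one. The evaluation function ESX actually applies is not given
here, so this sketch substitutes Euclidean distance to the cluster centroid, and
the single threshold `new_cluster_distance` stands in for the learning parameter.
Both are assumptions, as are the function name and the sample instances.

```python
import math


def sequential_cluster(instances, new_cluster_distance):
    """Cluster instances one at a time (stand-in for an incremental learner).

    An instance joins the cluster with the nearest centroid unless that
    distance exceeds new_cluster_distance, in which case it starts a new
    cluster with itself as the only member.
    """
    clusters = []   # list of member lists
    centroids = []  # centroid of each cluster, kept in step with clusters
    for inst in instances:
        best_idx, best_dist = None, math.inf
        for i, centroid in enumerate(centroids):
            d = math.dist(inst, centroid)
            if d < best_dist:
                best_idx, best_dist = i, d
        if best_idx is None or best_dist > new_cluster_distance:
            # Action (2): create a new cluster containing only this instance.
            clusters.append([inst])
            centroids.append(tuple(inst))
        else:
            # Action (1): place the instance into an existing cluster.
            members = clusters[best_idx]
            members.append(inst)
            centroids[best_idx] = tuple(
                sum(col) / len(members) for col in zip(*members)
            )
    return clusters


# Invented (log T90, log fluence) pairs, NOT catalog data.
toy = [(1.6, -5.2), (-0.4, -6.4), (1.5, -5.0), (-0.5, -6.3), (0.7, -6.1)]
print(len(sequential_cluster(toy, 1.5)))  # larger threshold discourages new clusters
print(len(sequential_cluster(toy, 1.0)))  # smaller threshold encourages new clusters
```

Because the result depends on the threshold, and in a sequential scheme possibly on
the presentation order as well, a suitable value has to be found by experiment for
each domain.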

RESULTS 

For our first experiment, we set the ESX learning parameter so as to restrict the 
formation of new classes. As a result, ESX clustered the data into the two classes 
found by previous standard techniques [1]. Table 3 shows a representative rule for
each class. Notice that both clusters are well-defined. 



TABLE 3. Representative Rules Taken from
the Two Class ESX Clustering

Class I
(Long Bursts)

    0.54 <= logT90 <= 2.66
    :rule accuracy 98.03%
    :rule coverage 96.99%

Class II
(Short Bursts)

    -1.55 <= logT90 <= 0.38
    :rule accuracy 98.14%
    :rule coverage 90.95%



For our second experiment, we allowed ESX to form a best set of three or more 
clusters. The results of this experiment showed the formation of four clusters. 
One of the four clusters contained a majority of intermediate bursts (class 1); a 
second cluster consisted of mostly short bursts (class 2). The remaining two clusters 
(classes 3 and 4) were subsets of the Mukherjee class I bursts. The class mean and 
standard deviation values for each of the six burst attributes are shown in Table 4. 

Table 5 offers representative rules for each of the four clusters. Figures 1 and 2 
as well as Table 4 indicate that class 3 contains mostly long/soft bursts and class 
4 contains long/bright bursts. The following rule represents a covering rule for the 
cluster formed by combining the class 3 and class 4 bursts. 

1.19 <= logT90 <= 2.66
:rule accuracy 90.26% 
:rule coverage 92.68% 
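
As a rough consistency check on the combined cluster, the class 3 and class 4
summary statistics reported in Table 4 can be pooled into a single mean and
standard deviation. The sketch below does this for log T90 using the burst counts,
means, and standard deviations from that table. It assumes the tabulated values
are sample standard deviations, which the source does not state, and the function
name is our own.

```python
import math


def pooled_mean_sd(groups):
    """Combine (n, mean, sd) summaries into one overall mean and sd.

    Assumes each sd is a sample standard deviation (n - 1 denominator).
    """
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total_n
    # Within-group sum of squares plus the between-group contribution.
    ss = sum((n - 1) * sd ** 2 + n * (m - mean) ** 2 for n, m, sd in groups)
    return mean, math.sqrt(ss / (total_n - 1))


# Log T90 entries of Table 4: (number of bursts, mean, sd) for classes 3 and 4.
combined_mean, combined_sd = pooled_mean_sd(
    [(195, 1.67, 0.32), (215, 1.62, 0.38)]
)
print(f"combined log T90: mean={combined_mean:.2f}, sd={combined_sd:.2f}")
```

The pooled values can then be set beside the class I row of Table 1; any
difference should be read with care, since the two clusterings do not assign
exactly the same bursts to the long class.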



TABLE 4. Mean and Standard Deviations for the ESX Four Class Clustering

                      Class 1         Class 2     Class 3      Class 4
                      Intermediate    Short       Long/Soft    Long/Bright    Domain

Number of Bursts      182             205         195          215            796

Log T50 (mean)        0.44            -0.78       1.27         1.18           0.53
        (sd)          0.44            0.44        0.37         0.44           0.93

Log T90 (mean)        0.85            -0.41       1.67         1.62           0.93
        (sd)          0.37            0.46        0.32         0.38           0.94

Log Fluence (mean)    -5.87           -6.36       -5.50        -4.84          -5.63
            (sd)      0.45            0.59        0.37         0.61           0.77

Log P256 (mean)       0.04            0.13        -0.07        0.48           0.15
         (sd)         0.43            0.38        0.22         0.51           0.45

Log HR32 (mean)       0.11            0.54        -0.03        0.38           0.26
         (sd)         0.27            0.30        0.24         0.16           0.33

Log HR321 (mean)      0.36            0.73        0.24         0.59           0.49
          (sd)        0.27            0.29        0.21         0.14           0.30



TABLE 5. Representative Rules Taken from the Four Class ESX Clustering

Class 1
(Intermediate)

    0.29 <= logT90 <= 1.20
    :rule accuracy 75.00%
    :rule coverage 74.18%

    0.21 <= logT50 <= 0.63
    and 0.29 <= logT90 <= 1.09
    :rule accuracy 89.02%
    :rule coverage 40.11%

Class 2
(Short)

    -1.55 <= logT90 <= 0.42
    and -1.92 <= logT50 <= -0.02
    :rule accuracy 93.20%
    :rule coverage 93.66%

    -7.80 <= logFluence <= -6.63
    and -1.92 <= logT50 <= -0.02
    :rule accuracy 95.95%
    :rule coverage 34.63%

Class 3
(Long/Soft)

    1.19 <= logT90 <= 2.66
    :rule accuracy 90.26%
    :rule coverage 92.68%

    0.02 <= logHR321 <= 0.08
    :rule accuracy 77.36%
    :rule coverage 21.03%

Class 4
(Long/Bright)

    -4.85 <= logFluence <= -3.11
    :rule accuracy 90.91%
    :rule coverage 51.16%



CONCLUSIONS 

We used ESX to cluster data about 797 gamma-ray bursts. When restricted to
forming a small number of classes, ESX determined two classes. However, when
allowed to form more than two classes, it created four. Two of the clusters
were similar to the class II and class III bursts determined by Mukherjee et al. [6].
Taken together, the two remaining clusters represent the class I Mukherjee et al. 
bursts. ESX differentiated the class I bursts by brightness and hardness. The 
separation of long bursts into two classes may be due in part to the fact that ESX 
makes no a priori assumptions about data distribution. 

We hypothesize that systematic effects may cause some class I bursts to take on 
class III characteristics [3]. Systematic biases may explain why class I bursts have 
been separated into two groups by ESX. Our future work will focus on testing these 
hypotheses with the help of additional induction-based techniques. 

FIGURE 1. 3/21 Hardness Ratio vs. ch 2 + 3 fluence

FIGURE 2. 3/21 Hardness Ratio vs. T90 duration